Abstract


We propose a probabilistic methodology for deducing a syndrome or syndromes (possibly induced by chemical/biological agents) associated with a large number of people from certain geographic areas who have well-established diagnoses and symptoms. For each geographic area, the finite element method and the databases of symptoms and diagnoses are used to establish an analytical probability distribution function, which gives the probability that a person has a certain number of symptoms/diagnoses. This, in turn, allows us to write down an analytic expression for the symptom/diagnosis density, from which, with the help of the databases, one deduces the most numerous symptoms and diagnoses overall; these define the syndrome for the particular geographic area. By comparing the syndromes with each other, one can then see to what extent geography, and what is on it, affects the syndromes associated with different geographic areas.
1. INTRODUCTION


It happens quite often that a problem whose resolution one pursues inductively yields a large number of possible solutions, which are only vaguely related. One is then faced with a new problem: which of these solutions is the realistic one for the situation at hand? A possible way to circumvent this dilemma is to proceed deductively rather than inductively from the very beginning, which, by its very nature, should lead to just one solution of the problem at hand. Such a situation, we believe, arises when one is trying to deduce a syndrome from a collection of symptoms and diagnoses for a large number of people who were confined to a specific geographic area. Here, we are inspired by the so-called Gulf War Syndrome and by the reported cases of symptoms acquired by U.S. military personnel during the Iraq-Kuwait War of 1991. The idea is to have a deductive methodology for dealing with this type of situation in the future.

We believe that the probabilistic methodology we describe should be able to yield syndromes that one could uniformly associate with particular geographies or particular medical preventive procedures (e.g., inoculation). As such, it represents a deductive approach for finding the syndrome, or syndromes if there is more than one.

What we intend to do here is to give the mathematical aspects of the probability distribution function of symptoms/diagnoses for a soldier associated with some military unit in a particular geographic location. The importance of this distribution function lies in the fact that all the soldiers are now treated "equally," in the sense that the distribution function tells us the probability that any soldier has, say, five symptoms/diagnoses of any kind. The data, which are compiled and will be available from the U.S. Army and Joint Services Environmental Support Group and the Veterans Administration Epidemiology Services, will make these distribution functions specific to particular units and particular geographic locations.

Needless to say, the methodology that employs the probabilistic formulation should also be applicable to other situations where a large number of people from a certain geographic area have well-established illnesses and symptoms. As such, it may go beyond military applications and, hopefully, establish itself as a useful tool in preventing the spread of illnesses due to abuses of the environment.
In the first part of section 2, we give a simplified description of the necessary data, i.e., symptoms, diagnoses, syndromes, unit identification code (UIC), Julian date (J-date), and the military grid reference system (MGRS). The second part of section 2 is devoted to casting these data into a simplified form suitable for formulating probability distribution functions. Section 3 is devoted to the probabilistic formulation of the syndrome with the help of the finite element method. Discussion, recommendations, and conclusion are in section 4.

2. THE DATA

No matter which way we go to deduce a syndrome, we need data, and in the end it is the data that make any formulation more or less useful. In our case, the data should contain both medical and personnel records. Here we give a brief description of these data, which, when acquired, should be easy to incorporate into the described methodology. The examples of syndrome descriptions that we finally present are worked out with fictitious data merely for the sake of illustrating the probabilistic formulation of syndromes.

2.1 Data Terminology. Here we familiarize ourselves with the terminology of the data, which come from two main sources: the Veterans Administration Epidemiology Services (VA-ES) and the U.S. Army and Joint Services Environmental Support Group (AJS-ESG). For the sake of simplicity, from now on these two services will be referred to simply as VA-ES and AJS-ESG, respectively. We use capital letters to describe these data.

From VA-ES, the data necessary to describe the probabilistic formulation of the syndrome are:

IDENTIFICATION NUMBER - A four-digit number identifying a person.

BRANCH OF SERVICE - Army, Navy, Marines, and Air Force.

THREE INDEPENDENT SYMPTOMS - Each symptom is characterized by a unique string of digits recorded as a code number. Before being specified, these can be denoted as ICDSYM 1, ICDSYM 2, and ICDSYM 3.

THREE INDEPENDENT DIAGNOSES - Each diagnosis is likewise characterized by a unique string of digits recorded as a code number. Again, before being specified, they can be denoted as ICDIAG 1, ICDIAG 2, and ICDIAG 3.

Both symptoms and diagnoses use the standard ICD-9-CM codes [1]. In the VA codes, the decimal points have been entirely eliminated, so the coded data run from 001 (for cholera) up to five-digit numbers such as 78999; hence, 780.7 appears as 7807, for example. When ICDSYM 3 = 0 for a particular person, that person has only two symptoms. Similarly, if ICDIAG 2 = 0 and ICDIAG 3 = 0 for some person, that person has only one diagnosis. As an example, a complete symptom and diagnosis description for a person could look something like this:


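$$
\begin{aligned}
&\text{ICDSYM 1} = 78999, \quad \text{ICDSYM 2} = 7807, \quad \text{ICDSYM 3} = 0;\\
&\text{ICDIAG 1} = 71940, \quad \text{ICDIAG 2} = 0, \quad \text{ICDIAG 3} = 0.
\end{aligned}
\tag{1}
$$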

These notations would be too cumbersome for mathematical manipulations, so we simplify them further:


$$ s_i = \text{ICDSYM } i, \quad d_j = \text{ICDIAG } j, \quad i, j = 1, 2, 3. \tag{2} $$

Symptoms and diagnoses are used together when describing an illness; therefore, to simplify matters, from now on we will call symptoms and diagnoses simply generalized symptoms. Furthermore, we may as well put them together in a six-component generalized symptom/diagnosis vector:

$$ \phi = (\phi_1, \phi_2, \ldots, \phi_6) = (s_1, s_2, s_3; d_1, d_2, d_3). \tag{3} $$

The example from relation (1) now simply reads $\phi = (78999, 7807, 0; 71940, 0, 0)$, which, of course, gives a complete description of someone's illness. The question now is how to arrange these generalized symptoms in the six-component vector. As far as our statistical model is concerned, this arrangement is arbitrary. However, a person putting together the probabilistic formulation of a syndrome could arrange them according to their severity (i.e., $s_1$ is the most severe symptom, $s_2$ less severe, and $s_3$ the least severe). The diagnoses can be treated the same way, with $d_1$, $d_2$, and $d_3$ in descending order of severity. Of course, that person would decide how to define the degrees of severity of symptoms and diagnoses.
As we see, then, the maximum number of generalized symptoms that can be assigned to a person is six. In this connection, we introduce a continuous generalized symptom number variable $x$; for a person, its values are restricted to lie between 0 and 6. Of course, its values may exceed 6 if we are talking about a syndrome whose definition includes more than six generalized symptoms. In any case, $x = 1$ means that a person has just one generalized symptom; this symptom may be, for example, $s_1 = 7807$ or $d_1 = 71840$, but not 0, since 0 means no symptom.
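The bookkeeping of this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not software from the report: the helper names are hypothetical, each record is assumed to supply at most three symptom codes and three diagnosis codes in dotted ICD-9-CM form, and codes are kept as strings so that leading zeros (as in 001) survive, with "0" marking an empty slot as in (1).

```python
def strip_icd_decimal(code):
    """Drop the decimal point from an ICD-9-CM code, as in the VA files (780.7 -> 7807)."""
    return code.replace(".", "")


def phi_vector(symptoms, diagnoses):
    """Build (s1, s2, s3; d1, d2, d3) as a 6-tuple, padding unused slots with "0"."""
    if len(symptoms) > 3 or len(diagnoses) > 3:
        raise ValueError("at most three symptoms and three diagnoses per person")
    s = [strip_icd_decimal(c) for c in symptoms] + ["0"] * (3 - len(symptoms))
    d = [strip_icd_decimal(c) for c in diagnoses] + ["0"] * (3 - len(diagnoses))
    return tuple(s + d)


def generalized_symptom_count(phi):
    """Discrete generalized symptom number: how many of the six slots are nonzero."""
    return sum(1 for code in phi if code != "0")


# The person of relation (1): two symptoms and one diagnosis.
phi = phi_vector(["78999", "780.7"], ["719.40"])
print(phi, generalized_symptom_count(phi))  # ('78999', '7807', '0', '71940', '0', '0') 3
```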

UIC - The unit identification code. Unfortunately, this code is different for different branches of the Service. For the Army, the UIC is a string of numbers preceded by a letter. In general, we shall simplify the discussion by skipping the codes and referring to military units simply as UIC1, UIC2, etc.

DATES - These refer to when a person (with an ID number) entered and exited the War Theater associated with a particular geography or geographies. Both dates are given as MM/DD/YY.

The VA-ES data are complemented with data from AJS-ESG. The items needed for the probabilistic treatment of the syndrome are again listed in capital letters:

UIC - The same as described in the Veterans Administration database (VA-ES).

JULIAN DATES (J-DATE) - These dates describe when a given UIC entered and exited a particular geographic area. The Julian date is a character string YYDDD. If YY = 91, the year is 1991, and DDD is a particular day in 1991, running from 001 to 365.
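Since the personnel DATES (MM/DD/YY) and the unit J-dates (YYDDD) use different formats, they must be converted before a soldier's stay can be matched against a unit's stay in a grid. A minimal sketch with Python's standard datetime module follows; the function names and the inclusive day count are our assumptions, and two-digit years are read the way strptime reads %y (69 to 99 become 1969 to 1999), which covers 1991.

```python
from datetime import date, datetime


def parse_jdate(jdate):
    """YYDDD J-date (e.g. '91045', day 45 of 1991) -> calendar date."""
    return datetime.strptime(jdate, "%y%j").date()


def parse_personnel_date(text):
    """MM/DD/YY personnel date (e.g. '02/14/91') -> calendar date."""
    return datetime.strptime(text, "%m/%d/%y").date()


def overlap_days(start_a, end_a, start_b, end_b):
    """Days shared by two inclusive date ranges; 0 if they do not overlap."""
    start, end = max(start_a, start_b), min(end_a, end_b)
    return max((end - start).days + 1, 0)


# A soldier in theater 02/01/91-02/20/91; a unit in some grid from J-date 91045 to 91060.
shared = overlap_days(parse_personnel_date("02/01/91"), parse_personnel_date("02/20/91"),
                      parse_jdate("91045"), parse_jdate("91060"))
print(shared)  # 7 (February 14 through February 20, 1991)
```

Checking such overlaps against a minimum stay is one way to enforce the "steady state" requirement discussed in section 2.2.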

MILITARY GRID REFERENCE SYSTEM (MGRS) [1] - This is simply a military two-dimensional coordinate frame given as grids. A grid is designated with two letters and two numbers. Each number is associated with a letter; so if the letter is L, the number can be written symbolically as $N_L$. Hence, a grid area with letters L and T can be written symbolically as L T $N_L$ $N_T$. The following situations should be distinguished: (1) $N_L$ and $N_T$ are absent, and the grid is 100 km by 100 km; (2) $N_L, N_T = 0, 1, \ldots, 9$, and the grid is 10 km by 10 km; (3) $N_L, N_T = 0, \ldots, 99$, and the grid is 1 km by 1 km; (4) $N_L, N_T = 0, \ldots, 999$, and the grid is 100 m by 100 m; and (5) $N_L, N_T = 0, \ldots, 9999$, and the grid is 10 m by 10 m. The first set of digits, $N_L$ in our example, is called the easting, while the second set of digits, $N_T$ in our example, is called the northing; the grid is located by its southwest corner.
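The digit-count rule above maps directly onto code. The sketch below handles only the simplified form used in this report, two 100 km square letters followed by equal-length easting and northing digit groups; real MGRS references are also prefixed by a grid zone designator, which is ignored here. The function name and the return layout are hypothetical.

```python
def parse_grid(ref):
    """Split a simplified grid reference such as 'DC1234'.

    Returns (square letters, easting in m, northing in m, cell size in m), where
    easting and northing locate the southwest corner of the cell inside the
    100 km x 100 km square named by the two letters.
    """
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha():
        raise ValueError(f"expected two grid letters in {ref!r}")
    if digits and not digits.isdigit():
        raise ValueError(f"expected digits after the letters in {ref!r}")
    if len(digits) % 2 or len(digits) > 8:
        raise ValueError("easting and northing need 0-4 digits each, equal in length")
    k = len(digits) // 2                 # digits per coordinate
    cell = 100_000 // 10 ** k            # 100 km, 10 km, 1 km, 100 m, or 10 m
    easting = int(digits[:k]) * cell if k else 0
    northing = int(digits[k:]) * cell if k else 0
    return letters, easting, northing, cell


print(parse_grid("DC"))      # ('DC', 0, 0, 100000)
print(parse_grid("DC37"))    # ('DC', 30000, 70000, 10000)
print(parse_grid("DC1234"))  # ('DC', 12000, 34000, 1000)
```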
At the National Institutes of Health (NIH) Conference in April 1994, sets of generalized symptoms (symptoms plus diagnoses) were established for a variety of syndromes. The following sets might be the most relevant for the probabilistic syndrome study:

(1) Effects of depleted uranium

(2) Effects of pesticides

(3) Effects of petrochemicals and petrochemical smoke

(4) Effects of nerve and mustard gas

(5) Effects of leishmaniasis infection, and

(6) Effects of other chemicals.

2.2 Assigning the Data. To have a realistic probabilistic formulation of the syndrome, we have to be very careful about how the data are assigned. For one thing, we must have a sufficient number of soldiers with the symptoms. Another concern is the time factor. If the symptoms and diagnoses are associated with a particular geographic area, we have to be sure that the unit (or the units comprising a larger unit) has been at this geographic location long enough for whatever caused the symptoms to have had time to produce them. In other words, we have to have "steady-state" conditions.

Consistent with this approach, we now describe the following generic situation. Let us take two unrelated units whose paths overlapped in a particular geographic region [1]. Figure 1 shows the grids occupied over a long period of time by two military units, denoted here for simplicity as UIC1 and UIC2. One easily sees that these two units overlap in two grids, DC and CD. Since, by assumption, the causes in both sets of grids contribute the same symptoms and diagnoses, and since both units spend a very long time in their respective grids, the cause and effect for these units are entirely due to these grids. Therefore, to simplify matters, we treat units UIC1 and UIC2 as one compound unit, UIC-Compound, as indicated in Figure 1.
Figure 1. Grids occupied by the compound military unit (UIC-Compound) composed of two separate military units (UIC1 and UIC2).

3. PROBABILISTIC FORMULATION

Once we know which compound unit (UIC-Compound) and geographic region we are dealing with, we can give the probabilistic formulation of the syndrome. At first, we shall only be interested in the number of generalized symptoms, which, as we recall, means symptoms plus diagnoses. Specifically, we wish to record the number of people who have, regardless of type, one generalized symptom, two generalized symptoms, and so on up to six generalized symptoms. For example, if one soldier has just one symptom, ICDSYM 1 = 78999, and another has just one diagnosis, ICDIAG 1 = 71940, they both qualify as having just one generalized symptom. More formally, we count as having one generalized symptom every soldier whose six-component generalized symptom/diagnosis vector looks like $\phi = (s_1, 0, 0; 0, 0, 0)$ or $\phi = (0, 0, 0; d_1, 0, 0)$, and denote their number as $n_1$. Similarly, for two generalized symptoms, we count every soldier whose vector looks like $\phi = (s_1, s_2, 0; 0, 0, 0)$, $\phi = (s_1, 0, 0; d_1, 0, 0)$, or $\phi = (0, 0, 0; d_1, d_2, 0)$, and denote their number as $n_2$; and so on all the way up to $n_6$, which is the number of soldiers whose vectors look like $\phi = (s_1, s_2, s_3; d_1, d_2, d_3)$ with none of the components equal to zero (of course, different soldiers will in principle have different $s_1$'s, etc.). Finally, we can write this down in six-component vector form as


$$ n = (n_1, n_2, \ldots, n_6). \tag{4} $$
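Assembling $n$ from the individual vectors is a simple tally, sketched below in Python. The function name is hypothetical; an empty slot may be either the integer 0 or the string "0", and soldiers with no generalized symptom at all are skipped, since the formulation only covers one to six.

```python
from collections import Counter


def symptom_count_vector(phi_vectors):
    """Return n = (n_1, ..., n_6): n_i soldiers have exactly i nonzero slots in phi."""
    def count(phi):
        return sum(1 for c in phi if c not in (0, "0"))

    tally = Counter(count(phi) for phi in phi_vectors)
    return tuple(tally[i] for i in range(1, 7))  # tally[0] (no symptoms) is ignored


soldiers = [
    (78999, 0, 0, 0, 0, 0),         # one symptom         -> n_1
    (0, 0, 0, 71940, 0, 0),         # one diagnosis       -> n_1
    (78999, 7807, 0, 71940, 0, 0),  # the record of (1)   -> n_3
]
print(symptom_count_vector(soldiers))  # (2, 0, 1, 0, 0, 0)
```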

Recall now that we have introduced the continuous generalized symptom variable $x$, which for our purposes varies between 0 and 6. We want this continuous variable because, after having found $n_1, n_2, \ldots, n_6$, we shall be able to construct analytical soldier probability distribution functions in terms of $x$. Namely, what we are talking about here is the probability that a soldier selected randomly from the UIC-Compound has the generalized symptom number $x$. However, to connect to a real situation, we have to relate this $x$ to the discrete number of generalized symptoms.

The minimum and maximum numbers of generalized symptoms that a person can have are 1 and 6, respectively. Consequently, from this continuous $x$ we define the discrete number of symptoms, say $x_i$, as follows:

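$$
\begin{aligned}
&1 \le x \le 1.5 \;\rightarrow\; x_1 = 1, \qquad 1.5 < x \le 2.5 \;\rightarrow\; x_2 = 2, \qquad 2.5 < x \le 3.5 \;\rightarrow\; x_3 = 3,\\
&3.5 < x \le 4.5 \;\rightarrow\; x_4 = 4, \qquad 4.5 < x \le 5.5 \;\rightarrow\; x_5 = 5, \qquad 5.5 < x \le 6 \;\rightarrow\; x_6 = 6.
\end{aligned}
\tag{5}
$$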

We see that, in fact, $x_i = i$, $i = 1, 2, \ldots, 6$. So, for example, if $x = 2.3$, we say that a randomly chosen soldier has two generalized symptoms.
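Relation (5) amounts to rounding $x$ to the nearest integer, with ties at the half-integers resolved downward. A one-function Python sketch (the name is hypothetical) makes the bin edges explicit:

```python
import math


def discrete_symptom_number(x):
    """Map the continuous x in [1, 6] to x_i = i using the bins of relation (5).

    The bins are [1, 1.5], (1.5, 2.5], ..., (4.5, 5.5], (5.5, 6]; each is closed
    on the right, so ceil(x - 1/2) returns the bin index.
    """
    if not 1.0 <= x <= 6.0:
        raise ValueError("x must lie in [1, 6]")
    return math.ceil(x - 0.5)


print(discrete_symptom_number(2.3))  # 2, the example in the text
```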

Now we are ready to construct the probability distribution function $P(x; n_1, n_2, \ldots, n_6)$, which tells us the probability that a randomly chosen soldier has $x$ generalized symptoms; in more precise language, $P(x; n_1, n_2, \ldots, n_6)\,dx$ is the probability that a randomly chosen soldier has a generalized symptom number between $x$ and $x + dx$. We will, however, use the "sloppy" language that $P(x; n_1, n_2, \ldots, n_6)$ is simply the "probability of getting $x$" for a randomly chosen soldier. Next comes the important question of the normalization of $P(x; n_1, n_2, \ldots, n_6)$. While the physical domain of $x$ is clearly between 0 and 6, we unfortunately have to settle for the range between 1 and 6. The reason is that there will be no records of soldiers who, although feeling the "effects" of some illness, showed zero generalized symptoms on examination. Hence, our probability distribution function will be defined for $x$ satisfying $1 \le x \le 6$, with the normalization
$$ \int_1^6 P(x; n_1, n_2, \ldots, n_6)\, dx = 1. \tag{6} $$

Finally, we are ready to construct $P(x; n_1, n_2, \ldots, n_6)$. To do that, we use the finite element method, a rather rigorous exposition of which can be found in [2], while a much simpler exposition, which we follow here, is given in [3].

Here the continuous generalized symptom variable $x$ is one-dimensional. The nodal points, which are needed to obtain the interpolation functions for constructing $P(x; n_1, n_2, \ldots, n_6)$, are then simply the points where $x$ takes the physical discrete values:

$$ x_i = i, \quad i = 1, 2, \ldots, 6. \tag{7} $$

These, in essence, define the one-dimensional, equidistant, six-node fifth-power element [2, 3], with which six interpolation functions are associated; we now derive them. For their derivation, we need the six-component polynomial basis vector [2, 3]:

$$ b(x) = (1, x, x^2, x^3, x^4, x^5), \tag{8} $$

whose form tells us why this element is a fifth-power element. With the help of (8), in "vector of vectors" form, we next write down the $6 \times 6$ matrix $m$:

$$ m = \begin{pmatrix} b(x_1) \\ b(x_2) \\ b(x_3) \\ b(x_4) \\ b(x_5) \\ b(x_6) \end{pmatrix}, \tag{9} $$
b(x^) 
whose detailed expression one can easily obtain from (9), (8), and (7). However, to obtain the interpolation functions, we actually need the inverse of this matrix. With the help of Mathematica [4], for example, we obtain

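$$
m^{-1} = \begin{pmatrix}
6 & -15 & 20 & -15 & 6 & -1 \\
-\tfrac{87}{10} & \tfrac{117}{4} & -\tfrac{127}{3} & 33 & -\tfrac{27}{2} & \tfrac{137}{60} \\
\tfrac{29}{6} & -\tfrac{461}{24} & 31 & -\tfrac{307}{12} & \tfrac{65}{6} & -\tfrac{15}{8} \\
-\tfrac{31}{24} & \tfrac{137}{24} & -\tfrac{121}{12} & \tfrac{107}{12} & -\tfrac{95}{24} & \tfrac{17}{24} \\
\tfrac{1}{6} & -\tfrac{19}{24} & \tfrac{3}{2} & -\tfrac{17}{12} & \tfrac{2}{3} & -\tfrac{1}{8} \\
-\tfrac{1}{120} & \tfrac{1}{24} & -\tfrac{1}{12} & \tfrac{1}{12} & -\tfrac{1}{24} & \tfrac{1}{120}
\end{pmatrix}. \tag{10}
$$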
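Relation (10) can be reproduced without Mathematica. The sketch below, an illustration that relies only on Python's standard fractions module (the helper names are ours), builds $m$ from (7) through (9) and inverts it exactly by Gauss-Jordan elimination, so that each printed row can be compared entry by entry with (10).

```python
from fractions import Fraction

NODES = range(1, 7)  # x_i = i, relation (7)


def basis(x):
    """Polynomial basis b(x) = (1, x, x^2, x^3, x^4, x^5), relation (8)."""
    return [Fraction(x) ** k for k in range(6)]


def invert(matrix):
    """Exact inverse by Gauss-Jordan elimination on the augmented matrix [A | I]."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]          # scale pivot row to 1
        for r in range(n):
            if r != col and aug[r][col] != 0:            # clear the column elsewhere
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


m = [basis(x) for x in NODES]  # rows b(x_1), ..., b(x_6), relation (9)
m_inv = invert(m)
for row in m_inv:
    print(" ".join(str(v) for v in row))
```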


According to the finite element method [2, 3], the six interpolation functions $u_1(x), u_2(x), \ldots, u_6(x)$ are obtained in vector form by contracting the polynomial basis vector $b(x)$ with $m^{-1}$:

$$ u(x) = (u_1(x), u_2(x), \ldots, u_6(x)) = b(x)\, m^{-1}, \tag{11} $$

which yields, specifically,


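$$ u_1(x) = -\tfrac{1}{120}(x-2)(x-3)(x-4)(x-5)(x-6), \tag{12a} $$

$$ u_2(x) = \tfrac{1}{24}(x-1)(x-3)(x-4)(x-5)(x-6), \tag{12b} $$

$$ u_3(x) = -\tfrac{1}{12}(x-1)(x-2)(x-4)(x-5)(x-6), \tag{12c} $$

$$ u_4(x) = \tfrac{1}{12}(x-1)(x-2)(x-3)(x-5)(x-6), \tag{12d} $$

$$ u_5(x) = -\tfrac{1}{24}(x-1)(x-2)(x-3)(x-4)(x-6), \tag{12e} $$

$$ u_6(x) = \tfrac{1}{120}(x-1)(x-2)(x-3)(x-4)(x-5). \tag{12f} $$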
With the help of Mathematica [4], or directly with some hard labor, one can see that

$$ u_4(x) = u_3(-x + 7), \quad u_5(x) = u_2(-x + 7), \quad u_6(x) = u_1(-x + 7); \tag{13} $$

as a consequence, in Figures 2, 3, and 4 we show only $u_1(x)$, $u_2(x)$, and $u_3(x)$, respectively. One should note that each $u_i(x)$ equals 1 at its own node $x_i = i$ and goes through zero, changing sign, at each of the other nodes. As such, the $u_i(x)$ are poised to be continuous approximations for histograms centered at $x_i = i$, whose widths are given as

$$ \Delta x_1 = \left(x_1 + \tfrac{1}{2}\right) - x_1 = \tfrac{1}{2}, \qquad \Delta x_i = 1 \quad (i = 2, \ldots, 5), \qquad \Delta x_6 = x_6 - \left(x_5 + \tfrac{1}{2}\right) = \tfrac{1}{2}, \tag{14} $$

where relation (5) was taken into account.

Furthermore, the functions $u_i(x)$ also satisfy

$$ \sum_{i=1}^{6} u_i(x) = 1. \tag{15} $$
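Because each $u_i$ from (11) is a fifth-degree polynomial that equals 1 at node $i$ and 0 at the other five nodes (row $k$ of $m$ times $m^{-1}$ is the $k$-th unit vector), it coincides with the Lagrange polynomial $\prod_{j \ne i} (x - j)/(i - j)$. The short Python sketch below, with a function name of our choosing, uses that form to spot-check the symmetry (13) and the sum rule (15) at a few sample points.

```python
from math import prod

NODES = (1, 2, 3, 4, 5, 6)


def u(i, x):
    """Interpolation function u_i(x) of the six-node element, in Lagrange form."""
    return prod((x - j) / (i - j) for j in NODES if j != i)


for x in (1.0, 1.25, 2.5, 3.7, 5.9):
    total = sum(u(i, x) for i in NODES)                               # (15): should be 1
    mirror = max(abs(u(7 - i, x) - u(i, 7 - x)) for i in (1, 2, 3))  # (13): should be 0
    print(f"x={x}: sum={total:.12f}, symmetry error={mirror:.1e}")
```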


Defining

$$ I_i = \int_1^6 u_i(x)\, dx, \quad i = 1, 2, \ldots, 6, \tag{16} $$


we write down the results in vector form:

$$ I = (I_1 = 0.32986,\; I_2 = 1.30208,\; I_3 = 0.86806,\; I_4 = I_3,\; I_5 = I_2,\; I_6 = I_1), \tag{17} $$

where we notice that (17) and (13) are consistent with each other. Because of (15), we clearly have

$$ \sum_{i=1}^{6} I_i = \int_1^6 dx = 5. \tag{18} $$

Figure 4. Interpolation function associated with the third (and fourth) nodal point.
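The integrals (16) can also be obtained exactly rather than numerically. The sketch below, standard-library Python with hypothetical helper names, expands each $u_i$ with rational coefficients and integrates it term by term. It gives $I = \tfrac{5}{288}(19, 75, 50, 50, 75, 19)$, the weights of the six-point closed Newton-Cotes rule, which round to the values in (17) and add up to 5.

```python
from fractions import Fraction

NODES = range(1, 7)


def poly_mul(p, q):
    """Product of two polynomials stored as coefficient lists, lowest power first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def u_coeffs(i):
    """Exact coefficients of u_i(x) = prod_{j != i} (x - j) / (i - j)."""
    coeffs = [Fraction(1)]
    for j in NODES:
        if j != i:
            coeffs = poly_mul(coeffs, [Fraction(-j, i - j), Fraction(1, i - j)])
    return coeffs


def integrate(coeffs, a, b):
    """Exact integral of a polynomial over [a, b]."""
    return sum((c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
                for k, c in enumerate(coeffs)), Fraction(0))


I = [integrate(u_coeffs(i), 1, 6) for i in NODES]  # relation (16)
print([str(v) for v in I])                         # exact rational values
print([f"{float(v):.6f}" for v in I], sum(I))      # compare with (17) and (18)
```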


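It is now easily seen that the normalized probability distribution function $P(x; n_1, n_2, \ldots, n_6)$ is given as the ratio of $n \cdot u(x)$ to $I \cdot n$:

$$ P(x; n_1, n_2, \ldots, n_6) = \frac{n \cdot u(x)}{I \cdot n} = \frac{\sum_{i=1}^{6} n_i u_i(x)}{\sum_{i=1}^{6} n_i I_i}. \tag{19a,b} $$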
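A direct transcription of (19b) is sketched below in Python. The $I_i$ are entered in the exact form $\tfrac{5}{288}(19, 75, 50, 50, 75, 19)$ that reproduces (17), the $u_i$ are written in their equivalent Lagrange form, and the normalization (6) is checked with a composite Simpson rule; the function names and the choice of quadrature are ours, not the report's.

```python
from math import prod

NODES = (1, 2, 3, 4, 5, 6)
I = tuple(5 * w / 288 for w in (19, 75, 50, 50, 75, 19))  # relation (17), exact form


def u(i, x):
    """Interpolation function u_i(x) in Lagrange form."""
    return prod((x - j) / (i - j) for j in NODES if j != i)


def P(x, n):
    """Relation (19b): sum_i n_i u_i(x) / sum_i n_i I_i, defined for 1 <= x <= 6."""
    return sum(ni * u(i, x) for i, ni in zip(NODES, n)) / sum(ni * Ii for ni, Ii in zip(n, I))


def simpson(f, a, b, steps=1000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    inner = sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, steps))
    return (f(a) + f(b) + inner) * h / 3


n = (80, 50, 40, 30, 20, 5)                    # an example n-vector
print(simpson(lambda x: P(x, n), 1.0, 6.0))    # normalization (6): should be close to 1
```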
So the only things we need to know now are the numbers of soldiers associated with each number of generalized symptoms, $i = 1, 2, \ldots, 6$; then we know the probability that a randomly chosen soldier has $x$ generalized symptoms.

A word of caution. The numbers of people associated with the numbers of generalized symptoms, $i = 1, 2, \ldots, 6$, may redefine the range of the continuous generalized symptom variable. Namely, suppose that $n_1 = n_2 = 0$ and the other $n$'s are different from zero. Then the range of $x$, rather than being from 1 to 6, will be only the combined range of $\Delta x_3$, $\Delta x_4$, $\Delta x_5$, and $\Delta x_6$. The distribution function is obtained from (19b) by setting $n_1 = n_2 = 0$, and it is now approximately normalized over the new combined range. The reason for this is that, numerically, the $I_i$ are almost independent of the range of integration. For example, if we define

$$ I'_i = \int_{\Delta x_i} u_i(x)\, dx, \quad i = 1, 2, \ldots, 6, $$

where the $\Delta x_i$ are the bins of (5) with the widths (14), then by direct numerical comparison one deduces that $I'_i \approx I_i$. Hence, it appears quite plausible that enlarging the range of integration will not make much difference. At this time, however, we do not discuss probability functions with "holes," i.e., cases in which some $n_i$ in the middle of the whole range are equal to zero.
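The comparison between $I'_i$ and $I_i$ is easy to repeat. The Python sketch below (hypothetical names, with a Simpson quadrature of our choosing) integrates each $u_i$ over its own bin $\Delta x_i$ from (5) and over the full range $[1, 6]$, and prints the two side by side so that the quality of the approximation can be judged node by node.

```python
from math import prod

NODES = (1, 2, 3, 4, 5, 6)
# Bins Delta x_i of relation (5): [1, 1.5], (1.5, 2.5], ..., (5.5, 6].
BINS = ((1.0, 1.5), (1.5, 2.5), (2.5, 3.5), (3.5, 4.5), (4.5, 5.5), (5.5, 6.0))


def u(i, x):
    """Interpolation function u_i(x) in Lagrange form."""
    return prod((x - j) / (i - j) for j in NODES if j != i)


def simpson(f, a, b, steps=1000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    inner = sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, steps))
    return (f(a) + f(b) + inner) * h / 3


I_full = [simpson(lambda x, i=i: u(i, x), 1.0, 6.0) for i in NODES]               # I_i, (16)
I_bin = [simpson(lambda x, i=i: u(i, x), a, b) for i, (a, b) in zip(NODES, BINS)]  # I'_i
for i, full, own in zip(NODES, I_full, I_bin):
    print(f"i={i}:  I_i = {full:.5f}   I'_i = {own:.5f}")
```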

However, establishing the probability distribution function is not the end of the story. A quantity that is potentially very useful is the generalized symptom density $\mathrm{xsy}(x; n_1, n_2, \ldots, n_6)$, defined as

$$ \mathrm{xsy}(x; n_1, n_2, \ldots, n_6) = x\, P(x; n_1, n_2, \ldots, n_6). \tag{20a} $$


Directly from (19b), we then have

$$ \mathrm{xsy}(x; n_1, n_2, \ldots, n_6) = \frac{\sum_{i=1}^{6} n_i\, x\, u_i(x)}{\sum_{i=1}^{6} n_i I_i}. \tag{20b} $$
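The density (20b) and the position of its maximum, which the application example later in this section uses to separate "ill" from "very ill" soldiers, can be computed as sketched below. The evenly spaced grid search (step 0.001) is our assumption; any one-dimensional maximizer would serve, and the function names are hypothetical.

```python
from math import prod

NODES = (1, 2, 3, 4, 5, 6)
I = tuple(5 * w / 288 for w in (19, 75, 50, 50, 75, 19))  # relation (17), exact form


def u(i, x):
    """Interpolation function u_i(x) in Lagrange form."""
    return prod((x - j) / (i - j) for j in NODES if j != i)


def xsy(x, n):
    """Relation (20b): x * sum_i n_i u_i(x) / sum_i n_i I_i, for 1 <= x <= 6."""
    numerator = sum(ni * u(i, x) for i, ni in zip(NODES, n))
    return x * numerator / sum(ni * Ii for ni, Ii in zip(n, I))


def argmax_xsy(n, step=1e-3):
    """Position of the largest xsy value on an evenly spaced grid over [1, 6]."""
    grid = [1.0 + k * step for k in range(round(5.0 / step) + 1)]
    return max(grid, key=lambda x: xsy(x, n))


print(argmax_xsy((80, 50, 40, 30, 20, 5)))  # maximum of the example density
```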
We will use this density later on to divide people into very ill and less ill. First, however, note that the average number of generalized symptoms over all randomly chosen people is simply the integral of (20b):

$$ \langle \mathrm{xsy}(x; n_1, n_2, \ldots, n_6) \rangle = \int_1^6 dx\, \mathrm{xsy}(x; n_1, n_2, \ldots, n_6) = \frac{\sum_{i=1}^{6} n_i J_i}{\sum_{i=1}^{6} n_i I_i}, \tag{21} $$

where

$$ J_i = \int_1^6 dx\, x\, u_i(x). \tag{22} $$


As we see, the two sets of quantities that we have to know are the $I_i$ and the $J_i$, both easily calculable. In six-component vector form, the $J_i$ from (22) are

$$ J = (J_1 = 0.46627,\; J_2 = 1.92212,\; J_3 = 3.96825,\; J_4 = 2.10814,\; J_5 = 7.19246,\; J_6 = 1.84276). \tag{23} $$


Again, combining (22) and (15), we obtain the sum rule

$$ \sum_{i=1}^{6} J_i = \int_1^6 x\, dx = \frac{35}{2}, \tag{24} $$

which the values in (23) satisfy.


One easily verifies the correctness of relations (19) and (21) by assuming that $n_1 = n_2 = \cdots = n_6 = n$, which, by (18) and (24), gives $P(x; n, n, \ldots, n) = 1/5$ and $\langle \mathrm{xsy}(x; n, n, \ldots, n) \rangle = 7/2$; i.e., every value of $x$ in $[1, 6]$ is equally likely, as it should be.
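The moments (22) can be confirmed exactly in the same rational-arithmetic style. This sketch (hypothetical helper names, standard library only) integrates $x\,u_i(x)$ over $[1, 6]$ and evaluates (21), so that the tabulated $J_i$, the sum rule $\sum_i J_i = 35/2$, and the uniform-case value $7/2$ can all be checked.

```python
from fractions import Fraction

NODES = range(1, 7)


def poly_mul(p, q):
    """Product of two coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def u_coeffs(i):
    """Exact coefficients of u_i(x) = prod_{j != i} (x - j) / (i - j)."""
    coeffs = [Fraction(1)]
    for j in NODES:
        if j != i:
            coeffs = poly_mul(coeffs, [Fraction(-j, i - j), Fraction(1, i - j)])
    return coeffs


def integrate(coeffs, a, b):
    """Exact integral of a polynomial over [a, b]."""
    return sum((c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
                for k, c in enumerate(coeffs)), Fraction(0))


I = [integrate(u_coeffs(i), 1, 6) for i in NODES]                      # relation (16)
J = [integrate(poly_mul([Fraction(0), Fraction(1)], u_coeffs(i)), 1, 6)  # x * u_i(x)
     for i in NODES]                                                   # relation (22)


def mean_symptoms(n):
    """Relation (21): sum_i n_i J_i / sum_i n_i I_i."""
    return sum(ni * Ji for ni, Ji in zip(n, J)) / sum(ni * Ii for ni, Ii in zip(n, I))


print([f"{float(v):.6f}" for v in J])   # compare with (23)
print(sum(J), mean_symptoms([1] * 6))   # sum rule (24) and the uniform case
```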

From our discussion after relation (19), we remember that if, for example, $n_1 = n_2 = 0$ and the other $n$'s are different from zero, then the range of $x$, rather than being from 1 to 6, will be only the combined range of $\Delta x_3$, $\Delta x_4$, $\Delta x_5$, and $\Delta x_6$. There we argued that, since $I'_i \approx I_i$, the distribution function is obtained from (19b) by setting $n_1 = n_2 = 0$. The situation here is a little different, however, since $\langle \mathrm{xsy}(x; 0, 0, n_3, n_4, n_5, n_6) \rangle$ also involves the $J_i$. Defining again

$$ J'_i = \int_{\Delta x_i} x\, u_i(x)\, dx, \quad i = 1, 2, \ldots, 6, $$

we again see numerically that, although the agreement is not perfect, we can still write $J'_i \approx J_i$. Consequently, even in the new, smaller range, we can use expression (21) when evaluating $\langle \mathrm{xsy}(x; 0, 0, n_3, n_4, n_5, n_6) \rangle$.
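Whether (21) remains adequate on a reduced range can be checked directly. The sketch below takes the hypothetical vector $n = (0, 0, 40, 30, 20, 5)$ and compares the value of (21), built from the tabulated $I_i$ and $J_i$ of (17) and (23), with the mean obtained by renormalizing $\sum_i n_i u_i(x)$ on the reduced range $[2.5, 6]$ and integrating numerically. It illustrates the check only, with quadrature and names of our choosing; it is not a result from the report.

```python
from math import prod

NODES = (1, 2, 3, 4, 5, 6)
I = (0.32986, 1.30208, 0.86806, 0.86806, 1.30208, 0.32986)  # relation (17)
J = (0.46627, 1.92212, 3.96825, 2.10814, 7.19246, 1.84276)  # relation (23)


def u(i, x):
    """Interpolation function u_i(x) in Lagrange form."""
    return prod((x - j) / (i - j) for j in NODES if j != i)


def simpson(f, a, b, steps=1000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    inner = sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, steps))
    return (f(a) + f(b) + inner) * h / 3


n = (0, 0, 40, 30, 20, 5)  # hypothetical vector with n_1 = n_2 = 0


def weighted(x):
    """Unnormalized sum_i n_i u_i(x)."""
    return sum(ni * u(i, x) for i, ni in zip(NODES, n))


mean_eq21 = sum(ni * Ji for ni, Ji in zip(n, J)) / sum(ni * Ii for ni, Ii in zip(n, I))
lo, hi = 2.5, 6.0  # combined range of Delta x_3 ... Delta x_6
mean_direct = simpson(lambda x: x * weighted(x), lo, hi) / simpson(weighted, lo, hi)
print(f"(21) with tabulated I, J: {mean_eq21:.4f}; renormalized on [{lo}, {hi}]: {mean_direct:.4f}")
```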

We now have essentially everything needed for some numerical exercises.

Application Examples. The idea here is very simple. Using the derived probability distribution functions, we compare the statistical properties of different UIC-Compounds occupying different geographic regions. If these statistical properties are very much the same for different UIC-Compounds in their respective geographic areas, then the geographies and what is on them have very little to do with the syndrome. On the other hand, if these statistical properties are very different at different geographic sites, the geographies and what is on them play an important role in each of these UIC-Compounds' syndromes. We should remember, however, that the assumption made at the beginning, of sufficiently long exposure to the geographies, must hold.

So we assume a particular UIC-Compound associated with a particular geography. From the construction of the generalized symptom vector for each ill soldier, as described in section 2, we obtain the components of the vector $n$. Here, as an example, we assume it to be of the form

$$ n = (80, 50, 40, 30, 20, 5), \tag{25} $$

that is, $n_1 = 80$ is the number of people whose generalized symptom/diagnosis vector looks like $\phi = (s_1, 0, 0; 0, 0, 0)$ or $\phi = (0, 0, 0; d_1, 0, 0)$, etc. The probability distribution function $P(x; 80, 50, 40, 30, 20, 5)$ can now be written explicitly from relations (19b) and (17). Its form is shown in Figure 5. We see that the choice of the six-component $n$-vector is rather reasonable, since from relation (21) the average number of generalized symptoms is numerically $\langle \mathrm{xsy}(x; 80, 50, 40, 30, 20, 5) \rangle = 2.826$, which, according to our assignments, means that a randomly chosen person would have about three generalized symptoms.

Figure 5. Example plot of the probability distribution function vs. the continuous generalized symptom variable $x$.
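The arithmetic behind the quoted average is short enough to write out. The Python lines below only plug (25) and the tabulated $I_i$ and $J_i$ of (17) and (23) into (21); they reproduce the report's number and add no new data.

```python
I = (0.32986, 1.30208, 0.86806, 0.86806, 1.30208, 0.32986)  # relation (17)
J = (0.46627, 1.92212, 3.96825, 2.10814, 7.19246, 1.84276)  # relation (23)
n = (80, 50, 40, 30, 20, 5)                                 # relation (25)

numerator = sum(ni * Ji for ni, Ji in zip(n, J))     # sum_i n_i J_i
denominator = sum(ni * Ii for ni, Ii in zip(n, I))   # sum_i n_i I_i
mean = numerator / denominator                       # relation (21)
print(f"<xsy> = {numerator:.4f} / {denominator:.4f} = {mean:.4f}")
```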

Relation (20) yields the generalized symptom density $\mathrm{xsy}(x; 80, 50, 40, 30, 20, 5)$ itself; Figure 6 shows its form. The curve has a maximum at about $x^* = 3.4$. The significance of $x^*$ is that it divides people into two categories: those with $x < x^*$, who by definition are just ill, and those with $x > x^*$, who by definition are very ill. These definitions allow us to deduce the syndrome itself by restricting ourselves to the very ill people. Hence, in this example, we pull from the database only those people who have 4, 5, or 6 generalized symptoms. All of these generalized symptoms, symptoms and diagnoses alike, are identified by their ICD-9-CM codes, as described in section 2.1. A program should be written that ranks them by frequency of occurrence, now, however, separately for ordinary symptoms and for diagnoses. Combining the most frequent symptoms and diagnoses, say three from each (these numbers are not fixed; we could easily take four from each, etc.), we can deduce the syndrome either from the NIH sets of syndromes described in section 2.1, or simply by giving our most frequent symptoms and diagnoses to a team of competent and independent physicians who should decide on the causes (syndrome), or both. The result should be a syndrome associated with our particular probability distribution function.

Figure 6. Example plot of the generalized symptom density vs. the continuous generalized symptom variable $x$.
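The ranking program mentioned above could start from something as simple as the following Python sketch. It is our illustration, not the report's software: the function name, the threshold of 4 generalized symptoms (matching this example), and the placeholder codes 101, 102, 201, and 202 in the usage lines are hypothetical, while the top-3 cutoff follows the text's "say, three from each."

```python
from collections import Counter


def rank_codes(phi_vectors, min_count=4, top=3):
    """Rank symptom and diagnosis codes among 'very ill' soldiers.

    A soldier counts as very ill when at least `min_count` of the six slots are
    nonzero. Symptoms (slots 1-3) and diagnoses (slots 4-6) are ranked separately.
    """
    very_ill = [phi for phi in phi_vectors if sum(c != 0 for c in phi) >= min_count]
    symptoms = Counter(c for phi in very_ill for c in phi[:3] if c != 0)
    diagnoses = Counter(c for phi in very_ill for c in phi[3:] if c != 0)
    return symptoms.most_common(top), diagnoses.most_common(top)


# Placeholder records; 101, 102, 201, 202 are hypothetical codes.
records = [
    (78999, 7807, 101, 71940, 201, 0),
    (78999, 7807, 0, 71940, 71840, 0),
    (7807, 0, 0, 71940, 0, 0),            # only 2 generalized symptoms: excluded
    (78999, 102, 101, 71840, 71940, 202),
]
top_symptoms, top_diagnoses = rank_codes(records)
print(top_symptoms)
print(top_diagnoses)
```

Ties are reported in first-encountered order, which is how `Counter.most_common` breaks them; a real study would want an explicit tie rule.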

Now, going from one geographic area with a UIC-Compound to another geographic area with a different UIC-Compound, we should be able to compare directly the probability distribution functions, the generalized symptom densities, the average numbers of generalized symptoms, and the positions of the density maxima. If these sets of quantities do not differ very much for the two geographies, then the syndrome is the same at both places and the geographies have very little to do with it. If, however, they do differ significantly for the two geographies, then the geographies themselves should hold the key to why people at one place are sick in one way and people at the other place in another way.

4. DISCUSSION AND CONCLUSION

It is too late to try to apply the probabilistic methods described here to the Gulf War Syndrome question. However, we believe that in the future this method should be able to answer rather straightforwardly whether or not the probability distribution functions constructed for every UIC or UIC-Compound look alike, i.e., whether the geography and the details of the battlefield environment have something to do with the syndrome. Of course, if the probability distribution functions are all very much alike, then the cause of the syndrome is something that the UICs and UIC-Compounds did to themselves. Finally, if desired, one can derive the syndrome for each probability distribution function with the help of the generalized symptom density and the database, as described in the previous section.